Publicly Available Topic Signatures for all WordNet Nominal Senses
نویسندگان
چکیده
Topic signatures are context vectors built for word senses and concepts. They can be automatically acquired from the web for any concept hierarchy using the “monosemous relative” method. Topic signatures have been shown to be useful in Word Sense Disambiguation, for modeling similarity between word senses, classifying new terms in hierarchies and also building hierarchical clusters of word senses for a given word. In this work we present a publicly available resource which comprises both automatically extracted examples for all WordNet 1.6 noun senses and topic signatures built based on those examples. We gathered around 700 sentences per each noun in WordNet. When the monosemous relatives are used to build a sense corpus for polysemous words, they comprise an average of around 3,500 sentences per word sense. The size of the topic signatures thus constructed is of around 4,500 words per word sense.
منابع مشابه
Approximating Hierarchy-Based Similarity for WordNet Nominal Synsets using Topic Signatures
Topic signatures are context vectors built for concepts. They can be automatically acquired for any concept hierarchy using simple methods. This paper explores the correlation between a distributional-based semantic similarity based on topic signatures and several hierarchy-based similarities. We show that topic signatures can be used to approximate link distance in WordNet (0.88 correlation), ...
متن کاملEnriching very large ontologies using the WWW
This paper explores the possibility to exploit text on the world wide web in order to enrich the concepts in existing ontologies. First, a method to retrieve documents from the WWW related to a concept is described. These document collections are used 1) to construct topic signatures (lists of topically related words) for each concept in WordNet, and 2) to build hierarchical clusters of the con...
متن کاملExploring Knowledge Bases for Similarity
Graph-based similarity over WordNet has been previously shown to perform very well on word similarity. This paper presents a study of the performance of such a graph-based algorithm when using different relations and versions of Wordnet. The graph algorithm is based on Personalized PageRank, a random-walk based algorithm which computes the probability of a random-walk initiated in the target wo...
متن کاملHighlighting relevant concepts from Topic Signatures
This paper presents deepKnowNet, a new fully automatic method for building highly dense and accurate knowledge bases from existing semantic resources. Basically, the method applies a knowledge-based Word Sense Disambiguation algorithm to assign the most appropriate WordNet sense to large sets of topically related words acquired from the web, named TSWEB. This Word Sense Disambiguation algorithm...
متن کاملA sense-based lexicon of count and mass expressions: The Bochum English Countability Lexicon
The present paper describes the current release of the Bochum English Countability Lexicon (BECL 2.1), a large empirical database consisting of lemmata from Open ANC (http://www.anc.org) with added senses from WordNet (Fellbaum, 1998). BECL 2.1 contains ≈ 11,800 annotated noun-sense pairs, divided into four major countability classes and 18 fine-grained subclasses. In the current version, BECL ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2004